ABSTRACT



A simplified semaphore method and apparatus for simultaneous
execution of multiple semaphore instructions and for
enforcement of necessary ordering. A central processing unit
having an instruction pipeline is coupled with a data cache
arrangement including a semaphore buffer, a data cache, and
a semaphore execution unit. An initial semaphore instruction
having one or more operands and a semaphore address is
transmitted from the instruction pipeline to the semaphore
buffer, and is in turn transmitted from the semaphore buffer
to the semaphore execution unit. The semaphore address of the
initial semaphore instruction is transmitted from the
instruction pipeline to the data cache to retrieve initial
semaphore data stored within the data cache at a location in
a data line of the data cache as identified by the semaphore
address. The semaphore instruction is executed within the
semaphore execution unit by operating upon the initial
semaphore data and the one or more semaphore operands so as
to produce processed semaphore data, which is then stored
within the data cache. Since the semaphore buffer provides
for entries of multiple semaphore instructions, the semaphore
buffer initiates simultaneous execution of multiple semaphore
instructions, as needed.

9 Claims, 2 Drawing Sheets

APPARATUS AND METHOD USING A
SEMAPHORE BUFFER FOR SEMAPHORE
INSTRUCTIONS

FIELD OF THE INVENTION 

The present invention relates to computer systems, and
more particularly relates to a computer system implementation
for providing efficient management of semaphore
operations in a multiprocessor system.

BACKGROUND OF THE INVENTION

In a multiprocessor system, many processors require
access to data that is shared by the processors. If serializing
precautions are not taken, there is a possibility of "data
tearing" when one processor writes to an area of memory
while it is being read by another processor. If this situation
exists, it is possible that the processor reading the memory
actually reads the "torn" pieces of two separate data values.
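
By way of illustration only, the data tearing described above can be
sketched in Python. The sketch assumes a hypothetical 64-bit value that
a writer updates as two separate 32-bit halves. The helper names
(split, join, torn_read) and the chosen values are illustrative
assumptions and do not appear in the described system.

```python
MASK32 = 0xFFFFFFFF  # assumed width of each half


def split(value):
    """Return the (high, low) 32-bit halves of a 64-bit value."""
    return (value >> 32) & MASK32, value & MASK32


def join(high, low):
    """Reassemble a 64-bit value from its two halves."""
    return (high << 32) | low


def torn_read(old, new):
    """Model a reader that samples memory between the writer's two stores."""
    memory = list(split(old))      # memory[0] is the high half, memory[1] the low half
    new_high, _new_low = split(new)
    memory[0] = new_high           # writer has stored only the high half so far
    return join(*memory)           # reader observes a mix of old and new halves


old_value = 0x00000001_FFFFFFFF
new_value = 0x00000002_00000000
observed = torn_read(old_value, new_value)
```

In this sketch the reader observes a value that is neither the old value
nor the new value, which is the "torn" result that serializing
instructions are intended to prevent.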

Semaphore instructions are serializing instructions used
to acquire shared data in the multiprocessor system, without
any "data tearing". Semaphore operations carried out by
execution of these instructions provide atomic read-modify-write,
ordered memory accesses.

In some multiprocessor systems, semaphore operations
are used to atomically access a main memory. Although various
processors of the system usually share access to the main
memory over a system bus, a processor locks the system bus
while performing the semaphore operations to provide for
the atomic accesses to main memory. System performance is
degraded because, while the system bus is locked, it is not
available for use by other processors of the system.

Additionally, in some systems, semaphore instructions are
executed by a processor's instruction pipeline. However,
hardware needed for execution of some sophisticated semaphore
instructions adds undesirable complexity to the pipeline.
Furthermore, performance is also affected negatively
due to resource conflicts between the semaphore instructions
and other instructions executing in the pipeline. In particular,
multiple semaphore instructions cannot execute in the
pipeline simultaneously.

What is needed is a simplified semaphore method and
apparatus for simultaneous execution of multiple semaphore
instructions and for enforcement of necessary ordering.

SUMMARY OF THE INVENTION 

The present invention provides a simplified semaphore
method and apparatus for simultaneous execution of multiple
semaphore instructions and for enforcement of necessary
ordering. As discussed in further detail subsequently
herein, a semaphore buffer of the present invention simplifies
support of semaphore instructions by providing for the
semaphore instructions (including fairly sophisticated ones
such as Fetch-and-Add or Compare-and-Swap) to be
executed within a data cache arrangement, rather than within
a central processing unit pipeline, and by allowing multiple
semaphore instructions to execute simultaneously, provided
that the simultaneous execution is consistent with necessary
ordering enforced for these semaphore instructions.
Furthermore, executing semaphore instructions within the
data cache arrangement of the invention enhances system
performance because semaphore operations do not use
atomic accesses to main memory, and the system bus is not
locked during semaphore operations.

Briefly, and in general terms, the invention includes the
central processing unit having the instruction pipeline
coupled with the data cache arrangement, which includes a
semaphore buffer, a data cache, and a semaphore execution
unit. For example, an initial semaphore instruction having
one or more operands and a semaphore address is transmitted
from the instruction pipeline to the semaphore buffer,
and is in turn transmitted from the semaphore buffer to the
semaphore execution unit. The semaphore address of the
initial semaphore instruction is transmitted from the
instruction pipeline to the data cache to retrieve initial
semaphore data stored within the data cache at a location in
a data line of the data cache as identified by the semaphore
address. For most semaphore instructions, this initial
semaphore data is returned to the pipeline, in order to update
destination registers.

One of the semaphore buffer's functions is to store the
semaphore instruction's operands and wait for a read part of
the cache memory access to complete. Upon completion of
the read part of the data cache access, the initial semaphore
data is transmitted from the data cache to the semaphore
execution unit. The semaphore instruction is executed
within the semaphore execution unit by operating upon the
initial semaphore data and the one or more semaphore
operands so as to produce processed semaphore data, which
is then stored within the data cache at the location in the data
line identified by the semaphore address of the initial
semaphore instruction. Since the semaphore buffer provides
for entries of multiple semaphore instructions, the semaphore
buffer initiates simultaneous execution of multiple
semaphore instructions, as needed.
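
The read, execute, and store sequence described above can be sketched
in Python as a software model only. The sketch assumes that the data
cache is modeled as a dictionary keyed by address and that the
execution unit is modeled as a plain function. The names
PendingSemaphore, execute_semaphore, and add_first_operand are
hypothetical and are not taken from the described hardware.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class PendingSemaphore:
    """One buffered semaphore instruction (illustrative model only)."""
    operate: Callable[[int, Tuple[int, ...]], int]  # stands in for the execution unit
    operands: Tuple[int, ...]
    address: int
    dest_register: str


def execute_semaphore(entry: PendingSemaphore,
                      data_cache: Dict[int, int],
                      registers: Dict[str, int]) -> Tuple[int, int]:
    # Read part of the cache access: fetch the initial semaphore data.
    initial = data_cache[entry.address]
    # The initial data is returned to the pipeline to update the destination register.
    registers[entry.dest_register] = initial
    # The execution unit combines the initial data with the buffered operands.
    processed = entry.operate(initial, entry.operands)
    # Write part of the cache access: store the processed data at the same address.
    data_cache[entry.address] = processed
    return initial, processed


def add_first_operand(initial: int, operands: Tuple[int, ...]) -> int:
    """Example operation: add the first operand to the initial data."""
    return initial + operands[0]


cache = {0x40: 7}
regs: Dict[str, int] = {}
entry = PendingSemaphore(add_first_operand, (3,), 0x40, "r1")
result = execute_semaphore(entry, cache, regs)
```

In this model the destination register receives the initial data while
the cache location receives the processed data, which mirrors the
division of work between the pipeline and the data cache arrangement.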

In general, the semaphores' data cache accesses are
Read-Modify-Write. As the semaphore operation is a
Read-Modify-Write operation, another semaphore buffer function
is to monitor for dependency conditions and to enforce
necessary ordering of some semaphore instructions. A
dependency condition is detected when there is a dependency
between a previous store and a subsequent load. As
mentioned, the loads are usually executed ahead of previous
stores in high performance processors, so the detection of
dependency conditions is crucial for the correct operation of
the processor. No store-fetch bypass can be done in this case,
because the semaphore's store data is available only much
later, so detection of this dependency condition by control
logic of the semaphore buffer causes it to interlock the
pipeline.
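
The need for the interlock can be illustrated with a small Python
model. The sketch assumes a dictionary-backed cache and a single
pending semaphore whose processed data has not yet been written back.
The function name load_after_semaphore, the address 0x80, and the
stored values are hypothetical choices made for illustration.

```python
def load_after_semaphore(interlock: bool) -> int:
    """Return the value seen by a load that follows a pending semaphore."""
    cache = {0x80: 5}
    # Processed semaphore data (initial value plus one) awaiting its store.
    pending_store = (0x80, 5 + 1)

    if interlock:
        # The pipeline stalls until the write part of the semaphore completes.
        address, value = pending_store
        cache[address] = value

    loaded = cache[0x80]  # the subsequent load executes here

    if not interlock:
        # Without an interlock the store lands only after the load has run.
        address, value = pending_store
        cache[address] = value

    return loaded


stale = load_after_semaphore(interlock=False)
fresh = load_after_semaphore(interlock=True)
```

Without the interlock the load in this model returns the stale initial
value. With the interlock it returns the processed value, which is the
ordering that the control logic is described as enforcing.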

Other aspects and advantages of the present invention will
become apparent from the following detailed description,
taken in conjunction with the accompanying drawings,
illustrating by way of example the principles of the invention.

BRIEF DESCRIPTION OF THE DRAWING 

FIG. 1 is a simplified partial block diagram of a preferred 
embodiment of the invention.
FIG. 2 is a detailed partial block diagram of a semaphore
buffer shown in the preferred embodiment of FIG. 1. 

DETAILED DESCRIPTION OF PREFERRED
EMBODIMENT

FIG. 1 is a simplified partial block diagram of a preferred
embodiment of the invention. A multiprocessor system 100
employs a plurality of preferred cache arrangements 101 of
the invention for efficient management of semaphore
operations in the multiprocessor system. Each cache arrangement
provides high-speed memory coupled between main
memory 103 and a respective first and second central
processing unit (CPU) 105, 106. A system bus 122 is
controlled by a respective bus controller interface 123
coupled with each of the central processing units.

Each central processing unit has a respective instruction
pipeline coupled with the respective data cache
arrangement, which includes a semaphore buffer 132, a data
cache 134 of high speed memory, and a semaphore execution
unit 136. An initial semaphore instruction having one or
more operands and a semaphore address is transmitted
from the instruction pipeline to the semaphore buffer 132,
and is in turn transmitted from the semaphore buffer 132
to the semaphore execution unit 136. The semaphore address
of the initial semaphore instruction is transmitted from the
instruction pipeline to the data cache 134 to retrieve
initial semaphore data stored within the data cache at a
location in a data line 138 as identified by the semaphore
address.

Upon completion of the read part of the data cache access,
the initial semaphore data is transmitted from the data cache
134 to the semaphore execution unit 136. The semaphore
instruction is executed within the semaphore execution unit
136 by operating upon the initial semaphore data and the one
or more semaphore operands so as to produce processed
semaphore data, which is stored within the data cache at the
location in the data line 138 identified by the semaphore
address. It should be understood that although operation of
the invention upon the initial semaphore instruction is
discussed in detail herein, the semaphore buffer 132 of the
invention provides for entries of multiple semaphore
instructions, and initiates simultaneous execution of multiple
semaphore instructions, as needed.

FIG. 2 is a detailed partial block diagram of the semaphore
buffer 132 of the preferred embodiment shown in FIG.
1. As shown in FIG. 2, the semaphore buffer 132 of the
invention provides four entries, which provide for initiating
simultaneous execution of four semaphore instructions. In
alternative embodiments, two entries, three entries, or more
than four entries are provided for simultaneous execution of
a different number of semaphore instructions. In the
preferred embodiment, each of the entries provides for storage
of respective semaphore entries, first operands, second
operands, destination register numbers, and semaphore
addresses.
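
As a minimal sketch, the entry layout described above can be modeled
in Python. The field names, the opcode string, and the allocate and
release helpers are hypothetical. The behavior when all entries are
occupied (returning None) is an assumption of the sketch and is not
stated in the description.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BufferEntry:
    """Fields stored per entry (names are illustrative assumptions)."""
    opcode: str
    first_operand: int
    second_operand: Optional[int]
    dest_register: int
    address: int


class SemaphoreBuffer:
    """Fixed-size buffer; four entries as in the preferred embodiment."""

    def __init__(self, num_entries: int = 4) -> None:
        self.entries: List[Optional[BufferEntry]] = [None] * num_entries

    def allocate(self, entry: BufferEntry) -> Optional[int]:
        """Place the entry in the first free slot and return its index."""
        for index, slot in enumerate(self.entries):
            if slot is None:
                self.entries[index] = entry
                return index
        return None  # assumed behavior when the buffer is full

    def release(self, index: int) -> None:
        """Free a slot once its processed data has been stored."""
        self.entries[index] = None


buffer = SemaphoreBuffer()
slots = [buffer.allocate(BufferEntry("fetch_add", 1, None, reg, 0x100 + 64 * reg))
         for reg in range(5)]
```

With four entries, the fifth allocation in this sketch finds no free
slot, and a slot becomes reusable after release is called on it.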

Control logic 210 of the semaphore buffer 132 initiates the
execution of the semaphore instructions, once their initial
data is returned by the data cache. In the preferred
embodiment, the control logic initiates simultaneous execution
of semaphore instructions by using the semaphore
buffer to store a subsequent semaphore instruction and to
initiate execution of the subsequent semaphore instruction
before the processed semaphore data of the initial semaphore
instruction is stored in the data cache, subject to
ordering constraints.
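
The overlap described above can be sketched as a simple timeline in
Python. The cycle counts are assumptions chosen only for illustration:
one instruction enters the buffer per cycle, the cache read takes two
cycles, and execution and store take one cycle each. The sketch also
assumes that the instructions address different data lines, so that no
interlock arises. The function name overlap_timeline is hypothetical.

```python
def overlap_timeline(num_instructions, read_latency=2):
    """Return per-instruction cycle numbers under the assumed latencies."""
    timeline = []
    for i in range(num_instructions):
        issue = i                          # instruction enters the semaphore buffer
        data_ready = issue + read_latency  # read part of the cache access completes
        execute = data_ready + 1           # execution unit produces processed data
        store = execute + 1                # processed data is written to the cache
        timeline.append({"issue": issue, "data_ready": data_ready,
                         "execute": execute, "store": store})
    return timeline


timeline = overlap_timeline(2)
```

Under these assumed latencies the subsequent instruction is buffered and
its cache read is initiated at cycle 1, before the initial instruction
stores its processed data at cycle 4.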

Additionally, the control logic 210 of the semaphore
buffer generates the interlock signal, as needed, to enforce
necessary dependency and ordering of some instructions. In
the preferred embodiment of the invention, upon transmitting
to the semaphore buffer 132 a subsequent semaphore
instruction having a semaphore address, comparator
circuitry in the control logic 210 of the semaphore buffer 132
generates an interlock signal if the processed semaphore
data of the initial semaphore instruction is not yet stored in
the data cache 134, and if the data line 138 identified by the
semaphore address of the initial semaphore instruction is the
same as a data line 138 identified by the semaphore address
of the subsequent semaphore instruction.
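
The data line comparison described above can be sketched in Python.
The 32-byte line size is an assumption, since the description does not
state a line size, and the names LINE_SIZE, data_line, and
semaphore_interlock are hypothetical.

```python
LINE_SIZE = 32  # assumed bytes per data line; not given in the description


def data_line(address):
    """Return the index of the data line that contains the address."""
    return address // LINE_SIZE


def semaphore_interlock(pending_addresses, new_address):
    """Signal an interlock if a pending semaphore targets the same data line.

    pending_addresses holds the semaphore addresses of buffered instructions
    whose processed data has not yet been stored in the data cache.
    """
    return any(data_line(pending) == data_line(new_address)
               for pending in pending_addresses)


same_line = semaphore_interlock([0x100], 0x11C)   # both addresses fall in line 8
other_line = semaphore_interlock([0x100], 0x120)  # 0x120 falls in line 9
```

Note that for a subsequent semaphore instruction the comparison in this
sketch is made at data line granularity only, with no byte overlap
check.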

Similarly, upon transmitting to the semaphore buffer 132
a subsequent load or store instruction having an address, the
semaphore buffer generates the interlock signal if the
processed semaphore data of the initial semaphore instruction is
not yet stored in the data cache 134, if the data line identified
by the semaphore address of the initial semaphore instruction
is the same as a data line identified by the address of the
subsequent load or store instruction, and if there is any byte
overlap between the location in the data line identified by the
semaphore address of the initial semaphore instruction and
a location in the data line, as identified by the address of the
subsequent load or store instruction. In any case, the
instruction pipeline is interlocked in response to the interlock signal
until the processed semaphore data of the initial semaphore
instruction is stored in the data cache 134.
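
The three conditions for a subsequent load or store can be sketched in
Python. The sketch assumes a 32-byte data line, explicit access sizes
in bytes, and accesses that do not cross a line boundary. None of
these details, nor the names used, are taken from the description.

```python
LINE_SIZE = 32  # assumed bytes per data line


def byte_range(address, size):
    """Return the half-open byte range [start, end) within the data line."""
    offset = address % LINE_SIZE
    return offset, offset + size


def load_store_interlock(store_pending, sem_address, sem_size, address, size):
    """Apply the three described conditions for a subsequent load or store."""
    # Condition 1: the processed semaphore data is not yet stored.
    if not store_pending:
        return False
    # Condition 2: both addresses identify the same data line.
    if sem_address // LINE_SIZE != address // LINE_SIZE:
        return False
    # Condition 3: the two locations share at least one byte.
    sem_start, sem_end = byte_range(sem_address, sem_size)
    start, end = byte_range(address, size)
    return sem_start < end and start < sem_end


overlapping = load_store_interlock(True, 0x100, 8, 0x104, 4)
adjacent = load_store_interlock(True, 0x100, 8, 0x108, 4)
```

In this sketch a load of bytes 4 through 7 overlaps a semaphore
occupying bytes 0 through 7 and is interlocked, while a load of the
adjacent bytes 8 through 11 in the same line is not.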

Furthermore, in accordance with the principles of the
invention, if the semaphore instruction is a Fetch-and-Add
instruction, then the one or more operands includes a first
operand, and the step of executing the semaphore instruction
includes adding the first operand to the initial semaphore
data so as to produce the processed semaphore data.
Additionally, if the semaphore instruction is a
Compare-and-Exchange instruction, then the one or more operands
includes a first operand and a second operand and the step
of executing the semaphore instruction includes comparing
the first operand to the initial semaphore data so as to
determine whether the first operand value is equal to the
initial semaphore data. The step of storing the processed
semaphore data includes storing the second operand as the
processed semaphore data if the first operand is equal to the
initial semaphore data. If the first operand is not equal to the
initial semaphore data, then the initial semaphore data
becomes the processed semaphore data without being
changed, and there is no need to store the processed
semaphore data. However, it should be understood that to provide
ease of hardware implementation, in some embodiments, the
processed semaphore data is stored regardless of whether or
not it is different from the initial semaphore data.
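
The two operations described above can be sketched in Python. The
sketch uses unbounded Python integers and therefore does not model any
fixed word width or overflow behavior, which the description does not
specify. The function names and the store_needed flag are illustrative
assumptions.

```python
def fetch_and_add(initial, first_operand):
    """Return the processed data: the initial data plus the first operand."""
    return initial + first_operand


def compare_and_exchange(initial, first_operand, second_operand):
    """Return (processed_data, store_needed) for a Compare-and-Exchange.

    If the first operand equals the initial data, the second operand becomes
    the processed data. Otherwise the initial data is left unchanged and no
    store is required.
    """
    if first_operand == initial:
        return second_operand, True
    return initial, False


added = fetch_and_add(7, 3)
matched = compare_and_exchange(5, 5, 9)
unmatched = compare_and_exchange(5, 4, 9)
```

An embodiment that always stores the processed data, as mentioned
above, would correspond to ignoring the store_needed flag in this
sketch and writing the returned data back in every case.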

The present invention provides a simplified semaphore
method and apparatus for simultaneous execution of multiple
semaphore instructions and for enforcement of necessary
ordering. Although specific embodiments of the invention
have been described and illustrated, the invention is not
to be limited to the specific forms or arrangements of parts
so described and illustrated, and various modifications and
changes can be made without departing from the scope and
spirit of the invention. Within the scope of the appended
claims, therefore, the invention may be practiced otherwise
than as specifically described and illustrated.
What is claimed is: 
1. A method comprising the steps of:
  providing a central processing unit having an instruction
  pipeline;
  providing a data cache arrangement including a semaphore
  buffer, a data cache, and a semaphore execution unit;
  transmitting an initial semaphore instruction having one
  or more semaphore operands and a semaphore address
  from the instruction pipeline to the semaphore buffer;
  transmitting the semaphore address of the initial semaphore
  instruction to the data cache;
  retrieving initial semaphore data stored within the data
  cache at a location in a data line as identified by the
  semaphore address of the initial semaphore instruction;
  transmitting the initial semaphore data from the data
  cache to the semaphore execution unit;
  transmitting the initial semaphore instruction and the one
  or more semaphore operands from the semaphore
  buffer to the semaphore execution unit;
  executing the initial semaphore instruction within the
  semaphore execution unit by operating upon the initial
  semaphore data and the one or more semaphore operands
  so as to produce processed semaphore data; and
  completing execution of the initial semaphore instruction
  by storing the processed semaphore data within the data
  cache at the location in the data line identified by the
  semaphore address of the initial semaphore instruction.

2. A method as in claim 1 further comprising:
  transmitting a subsequent semaphore instruction having a
  semaphore address from the instruction pipeline to the
  semaphore buffer; and
  using the semaphore buffer to generate an interlock signal
  if the processed semaphore data of the initial semaphore
  instruction is not yet stored in the data cache and
  if the data line identified by the semaphore address of
  the initial semaphore instruction is the same as a data
  line identified by the semaphore address of the
  subsequent semaphore instruction;
  interlocking the instruction pipeline in response to the
  interlock signal until the processed semaphore data is
  stored in the data cache.

3. A method as in claim 1 further comprising:
  transmitting a subsequent load instruction having an
  address from the instruction pipeline to the semaphore
  buffer; and
  using the semaphore buffer to generate an interlock signal,
    if the processed semaphore data of the initial semaphore
    instruction is not yet stored in the data cache,
    if the data line identified by the semaphore address of
    the initial semaphore instruction is the same as a data
    line identified by the address of the subsequent load
    instruction, and
    if there is any byte overlap between the location in the
    data line identified by the semaphore address of the
    initial semaphore instruction and a location in the
    data line as identified by the address of the
    subsequent load instruction; and
  interlocking the instruction pipeline in response to the
  interlock signal until the processed semaphore data is
  stored in the data cache.

4. A method as in claim 1 further comprising:
  transmitting a subsequent store instruction having an
  address from the instruction pipeline to the semaphore
  buffer; and
  using the semaphore buffer to generate an interlock signal,
    if the processed semaphore data of the initial semaphore
    instruction is not yet stored in the data cache,
    if the data line identified by the semaphore address of
    the initial semaphore instruction is the same as a data
    line identified by the address of the subsequent load
    instruction, and
    if there is any byte overlap between the location in the
    data line identified by the semaphore address of the
    initial semaphore instruction and a location in the
    data line as identified by the address of the
    subsequent load instruction; and
  interlocking the instruction pipeline in response to the
  interlock signal until the processed semaphore data is
  stored in the data cache.

5. A method as in claim 1 wherein:
  the semaphore instruction is a fetch and add instruction;
  the one or more operands includes a first operand; and
  the step of executing the semaphore instruction includes
  adding the first operand to the initial semaphore data so
  as to produce the processed semaphore data.

6. A method as in claim 1 wherein:
  the semaphore instruction is a compare and exchange
  instruction;
  the one or more operands includes a first operand and a
  second operand;
  the step of executing the semaphore instruction includes
  comparing the first operand to the initial semaphore
  data so as to determine whether the first operand value
  is equal to the initial semaphore data; and
  the step of storing the processed semaphore data includes
  storing the second operand as the processed semaphore
  data if the first operand is equal to the initial semaphore
  data.

7. A method comprising the steps of:
  providing a data cache arrangement including a semaphore
  buffer, a data cache, and a semaphore execution unit;
  storing in the semaphore buffer an initial semaphore
  instruction having a semaphore address;
  retrieving initial semaphore data stored within the data
  cache;
  executing the initial semaphore instruction within the
  execution unit by operating upon the initial semaphore
  data so as to produce processed semaphore data; and
  initiating simultaneous execution of semaphore instructions
  by using the semaphore buffer to store a subsequent
  semaphore instruction and to begin execution of
  the subsequent semaphore instruction before the processed
  semaphore data of the initial semaphore instruction
  is stored in the data cache.

8. A method as in claim 7 further comprising the step of
completing execution of the initial semaphore instruction by
storing the processed semaphore data within the data cache.

9. An apparatus comprising:
  a central processing unit including an instruction pipeline
  for providing a semaphore instruction having one or
  more semaphore operands and a semaphore address;
  and
  a data cache arrangement including:
    a semaphore buffer coupled with the instruction pipeline
    for receiving the semaphore instruction and the
    semaphore operand;
    a data cache coupled with the instruction pipeline for
    receiving the semaphore address and for providing
    initial semaphore data stored within the data cache in
    a data line as identified by the semaphore address;
    a semaphore execution unit coupled with the semaphore
    buffer for receiving the semaphore instruction
    and the semaphore operand, and coupled with the
    data cache for receiving the initial semaphore data,
    the semaphore execution unit being adapted for
    executing the semaphore instruction by operating
    upon the initial semaphore data and the one or more
    operands so as to produce processed semaphore data
    and adapted for storing the processed semaphore
    data back in the data line identified by the semaphore
    address.